Proximal Minimization Algorithms with Cutting Planes


Siriphong  Lawphongpanich* 

Department  of  Operations  Research 
Naval  Postgraduate  School 
Monterey,  California  93943 

December,  1991 
Abstract 

This paper examines a class of proximal minimization algorithms in which the
objective function of the underlying convex program is approximated by cutting
planes. This class includes algorithms such as cutting plane, cutting plane with
line search and bundle methods. Among these algorithms, the bundle methods can
be viewed as a quadratic counterpart of the cutting plane algorithm with line
search, for they both attempt to decrease the true objective function at every
iteration. On the other hand, the cutting plane algorithm does not explicitly or
directly attempt to decrease the true objective function. Instead, it relies on
the monotonicity of the approximating function to guarantee convergence to an
optimal solution. This prompts the question of whether there exists a quadratic
counterpart for the cutting plane algorithm. To provide an affirmative answer,
this paper constructs a new convergent algorithm which resembles, but is
different from, the bundle methods. Also, to make the relationship between
bundle methods and proximal minimization more concrete, this paper supplies a
convergence proof for a variant of the bundle methods which utilizes analysis
common to proximal minimization.

*This  research  was  supported  by  grants  from  National  Science  Foundation  (DDM-8814499) 
and  the  direct  funding  program  at  the  Office  of  Naval  Research  and  Naval  Postgraduate  School 
1. Introduction


This paper proposes an application of the proximal minimization algorithm to
the following problem:

$$D: \quad \mathcal{L}(u^*) = \min_{u \in U} \mathcal{L}(u)$$

where $\mathcal{L}(u)$ is convex and $U$ is a compact subset of $R^m$. In
particular, $U$ is assumed to be a polyhedron of the form
$\{u : Au \le b \text{ and } u \in R^m\}$, where $A$ is a $p \times m$ matrix
and $b$ is a vector in $R^p$. To simplify the presentation and motivate
applications to Lagrangian duality and variational inequalities, the objective
function $\mathcal{L}(u)$ is also assumed to have the following form:

$$\mathcal{L}(u) = \max_{x \in X} \{f(x) + u \cdot g(x)\} \qquad (1)$$

where $X$ is a compact subset of $R^n$, $f(x)$ is a real-valued function on
$R^n$, and $g(x)$ is a vector-valued function mapping $R^n$ to $R^m$. The
notation $a \cdot b$ denotes the usual dot product between two vectors, $a$ and
$b$.
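As an illustration of form (1), the following minimal sketch evaluates the dual function when $X$ is a finite set, which is an assumption made here only so that the inner maximization can be done by enumeration. The names `dual_function`, `dot` and the sample data are hypothetical and do not come from the paper.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Usual dot product a . b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def dual_function(
    u: Vector,
    X: Sequence[float],
    f: Callable[[float], float],
    g: Callable[[float], Vector],
) -> Tuple[float, float]:
    """Return (L(u), x) where x attains max_{x in X} f(x) + u . g(x).

    X is assumed finite here so the maximum can be found by enumeration.
    """
    best_x = max(X, key=lambda x: f(x) + dot(u, g(x)))
    return f(best_x) + dot(u, g(best_x)), best_x


# Hypothetical data: X = {0, 1, 2}, f(x) = -x^2, g(x) = (x,)
X = [0.0, 1.0, 2.0]
f = lambda x: -x * x
g = lambda x: (x,)

value, x_star = dual_function((3.5,), X, f, g)
print(value, x_star)
```

The maximizer returned alongside the value is what later supplies a cut: $g(x)$ evaluated at that point is a subgradient of the dual function at $u$.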

When $U$ is taken to be the (noncompact) set
$\{u : u \ge 0 \text{ and } u \in R^m\}$, D is simply the Lagrangian dual
problem of the following nonlinear program:

$$P: \quad f(x^*) = \max_{x} f(x) \quad \text{s.t.} \quad g(x) \ge 0, \; x \in X.$$

Under an additional assumption that there exists an $x$ such that $g(x) > 0$,
the solution to P can be obtained by solving D with
$U = \{u : 0 \le u \le M \text{ and } u \in R^m\}$, where $M$ is sufficiently
large.

On the other hand, when $f(x) = -F(x) \cdot x$ and $g(x) = F(x)$, where $F(x)$
is a continuous mapping from $R^m$ into itself and satisfies, for some
$\alpha > 0$,

$$(F(u) - F(x)) \cdot (u - x) \ge \alpha \|u - x\|^2, \quad \forall u, x \in U,$$

then

$$\mathcal{L}(u) = \max_{x \in U} \{-F(x) \cdot (x - u)\}$$

and D becomes

$$\min_{u \in U} \max_{x \in U} \{-F(x) \cdot (x - u)\} = -\max_{u \in U} \min_{x \in U} \{F(x) \cdot (x - u)\}.$$

Hearn et al. (1984) referred to the problem on the right as the dual of the
formulation based on the gap function for the following variational inequality:

Find $u^* \in U$ such that $F(u^*) \cdot (x - u^*) \ge 0$, $\forall x \in U$.

For the remainder, it is convenient to simply refer to $\mathcal{L}(u)$ as the
dual function.

To solve D, the proximal minimization algorithm (see, e.g., Bertsekas and
Tsitsiklis, 1990, Martinet, 1970, and Rockafellar, 1976) generates a sequence of
points in $U$ by the iteration

$$u^{k+1} = \arg\min_{u \in U} \{\mathcal{L}(u) + \frac{1}{2c_k}\|u - u^k\|^2\}, \quad k = 1, 2, \ldots \qquad (2)$$

where $u^1$ is a starting point, $\|\cdot\|$ denotes the Euclidean norm and
$\{c_k\}$ is a sequence of positive numbers with

$$\liminf_{k \to \infty} c_k > 0.$$

Although the above iterative process converges to an optimal solution of D,
there is a concern regarding its practicality. Bertsekas and Tsitsiklis (1990)
pointed out in their book that the proximal minimization algorithm requires
solutions to a sequence of problems instead of just one problem. When
$\mathcal{L}(u)$ is nondifferentiable, this concern is more acute. Adding the
'proximal' term $\frac{1}{2c_k}\|u - u^k\|^2$ only makes the objective function
of the problem in (2) strictly convex. So, when $\mathcal{L}(u)$ is
nondifferentiable, the objective function in (2) is still nondifferentiable, and
solving a sequence of nondifferentiable, but strictly convex, problems does not
appear as attractive as solving only one nondifferentiable problem that may not
be strictly convex.
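To make iteration (2) concrete, here is a small sketch for the scalar function $|u|$, chosen only because its proximal step has a well-known closed form (soft-thresholding). It assumes a constant $c_k = c$ and a set $U$ large enough that the constraint is never active; the function names are illustrative and not part of the paper.

```python
def prox_abs(center: float, c: float) -> float:
    """argmin_u |u| + (u - center)^2 / (2c): soft-thresholding of center by c."""
    if center > c:
        return center - c
    if center < -c:
        return center + c
    return 0.0


def proximal_minimization(u1: float, c: float, max_iter: int = 50) -> list:
    """Apply iteration (2) to L(u) = |u| until the iterate stops changing."""
    iterates = [u1]
    for _ in range(max_iter):
        nxt = prox_abs(iterates[-1], c)
        if nxt == iterates[-1]:  # fixed point of the proximal map: a minimizer
            break
        iterates.append(nxt)
    return iterates


path = proximal_minimization(2.5, 1.0)
print(path)
```

Each proximal step here is trivial because the objective is simple. For a general nondifferentiable dual function, every step is itself a nondifferentiable problem, which is the practical concern raised above.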

To make proximal minimization more amenable to D, this paper approximates
$\mathcal{L}(u)$ in (2) by the following function:

$$\mathcal{L}(u) \approx L^k(u) = \max_{i=1,\ldots,k} \{f(x^i) + u \cdot g(x^i)\}$$

where $x^i \in X$. When $x^i$ is chosen appropriately, $L^k(u)$ is simply the
maximum of a finite number of hyperplanes tangential to $\mathcal{L}(u)$. These
hyperplanes are generally known as cuts or cutting planes.
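The following sketch illustrates the cutting plane model for a scalar $u$. It uses the hypothetical dual function $\max(u - 1, 1 - u)$, which has form (1) with two points in $X$, and stores each cut as a pair $(f(x^i), g(x^i))$. The names are illustrative assumptions.

```python
def cut_model(cuts, u):
    """L^k(u) = max_i (f_i + g_i * u) for scalar u; cuts is a list of (f_i, g_i)."""
    return max(f_i + g_i * u for f_i, g_i in cuts)


def true_dual(u):
    """Hypothetical dual function L(u) = max(u - 1, 1 - u) = |u - 1|."""
    return abs(u - 1.0)


# A single cut, u - 1, as would be generated at any trial point u > 1.
cuts = [(-1.0, 1.0)]

# The model never exceeds the true function and is exact where the cut is tight.
gaps = [true_dual(u) - cut_model(cuts, u) for u in (-2.0, 0.0, 1.0, 3.0)]
print(gaps)
```

With only one cut the model is a poor approximation to the left of $u = 1$; adding the second cut $(1, -1)$ makes it exact everywhere.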




To  unify  the  above  scheme  with  other  algorithms  that  use  cutting  planes,  this 
paper  describes  in  the  next  section  a  generic  algorithm  which  combines  cutting 
planes  with  proximal  minimization.  From  this  generic  algorithm,  three  algorithms 
from  the  literature  can  be  derived;  they  are  the  cutting  plane  algorithm,  the 
cutting  plane  algorithm  with  line  search  and  the  family  of  bundle  methods.  Among 
these  algorithms,  the  bundle  methods  can  be  viewed  as  a  quadratic  counterpart  of 
the  cutting  plane  algorithm  with  line  search  or  vice  versa,  i.e.,  the  latter  is  a  linear 
counterpart  for  the  former.  This  prompts  the  question  of  whether  there  exists  a 
quadratic  counterpart  for  the  (plain)  cutting  plane  algorithm.  The  results  in  this 
paper  provide  an  affirmative  response  to  the  question. 

For the remainder of the paper, Section 2 formally states the generic algorithm
and derives from it the three algorithms in the literature. Also derived is the
new algorithm, which is a quadratic counterpart of the cutting plane algorithm.
Section 3 provides convergence results for the new algorithm. To establish a
closer relationship between proximal minimization and bundle methods, Section 4
provides a convergence proof for a simple version of the latter which is
different from those in the literature and uses analysis common to proximal
minimization. Finally, Section 5 concludes the paper.
2. Classification of Algorithms


To  classify  and  establish  relationships  among  algorithms,  we  first  state  a  generic 
algorithm  and  then  show  how  it  can  be  specialized  to  the  four  algorithms.  Three 
of  the  four  exist  in  the  literature  and  the  last  is  new  and  shown  to  be  a  quadratic 
counterpart  of  the  cutting  plane  algorithm. 

A GENERIC ALGORITHM

Step 1: Select $u^1 \in U$. Set $k = 1$, $v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \{L^k(u) + \frac{1}{2c_k}\|u - v^k\|^2\}.$$

If $v^k$ also solves the problem, stop and $v^k$ solves D.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Derive $v^{k+1} \in U$ from $u^{k+1}$ and $v^k$ using some process
and/or criteria (see discussion below). Set $k = k + 1$ and return to Step 2.


Note that the (master) problem in Step 2 is slightly different from the one in
equation (2) of the previous section. The 'prox-center' in the proximal term is
$v^k$ for the master problem and it is $u^k$ for the problem in (2). In
addition, the master problem in Step 2 can be stated as

$$MP: \quad \min \; w + \frac{1}{2c_k}\|u - v^k\|^2$$
$$\text{s.t.} \quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$Au \le b$$
where the first $k$ constraints are generally referred to as cuts or cutting
planes. The dual of MP can be written as

$$MD: \quad \max \; -\frac{c_k}{2}\|G\pi + A^t\lambda\|^2 + (G^t v^k + f) \cdot \pi + (Av^k - b) \cdot \lambda$$
$$\text{s.t.} \quad \sum_{i=1}^{k} \pi_i = 1$$
$$\pi_i \ge 0, \quad i = 1, \ldots, k$$
$$\lambda_j \ge 0, \quad j = 1, \ldots, p$$

where $f$ denotes a vector in $R^k$ with $f(x^i)$ as its components, $G$ denotes
an $m \times k$ matrix with $g(x^i)$ as its columns, $\pi_i$ are the dual
variables corresponding to the cutting plane constraints and $\lambda_j$ are the
dual variables corresponding to the constraints defined by the matrix $A$. In
any case, both the master problem and its dual can be solved in a finite number
of iterations. Pang (1983) and Lin and Pang (1987) reviewed a large number of
algorithms applicable to both MP and MD. More specifically, Kiwiel (1991)
designed a dual algorithm to solve MP and Bertsekas (1982) proposed an efficient
algorithm designed especially for convex programming problems with simple
constraints such as those in MD.

Below, we describe four specializations of the generic algorithm. They are the
cutting plane algorithm, the cutting plane algorithm with line searches, the
bundle methods and a new algorithm called the proximal minimization algorithm
with cutting planes.

The cutting plane (CP) algorithm: The generic algorithm reduces to the CP
algorithm when $c_k = \infty$ and $v^{k+1} = u^{k+1}$ $\forall k$. First,
setting $c_k = \infty$ makes the proximal term vanish from the objective
function of the master problem in Step 2, thereby reducing it to the following
linear program:

$$ML: \quad \min \; w$$
$$\text{s.t.} \quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$Au \le b$$

Without the proximal term and always setting $v^{k+1} = u^{k+1}$, the variable
$v$ becomes superfluous and can be eliminated from the algorithm entirely. This
reduces Step 4 to simply incrementing $k$ by one. It can be shown that the
stopping rule in Step 2 of the generic algorithm is equivalent to the one
typical for the CP algorithm, which is to stop when
$\mathcal{L}(u^{k+1}) = L^k(u^{k+1})$ in Step 3.

The CP algorithm was first introduced by Cheney and Goldstein (1959) and
Kelley (1960). Dantzig and Wolfe (1960) developed a related algorithm called the
column generation technique in the context of decomposing large scale linear
programs. Column generation was later generalized to solve Lagrangian dual
problems for mathematical programs (see Dantzig, 1963 and Magnanti et al., 1976)
and was given the name generalized linear programming technique. Regardless of
the terminology, it is well known (see, e.g., Dantzig, 1963, Magnanti et al.,
1976 and Zangwill, 1969) that the convergence of the CP algorithm follows from
the monotonicity of the sequence $\{w^k\}$ or $\{L^{k-1}(u^k)\}$. However, the
corresponding sequence of dual function values, $\{\mathcal{L}(u^k)\}$, is not
necessarily monotonic. Therefore, the CP algorithm is a variant of the generic
algorithm which has a linear master problem and does not attempt to descend the
dual function.
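The behaviour described above can be observed on a one-dimensional toy problem. The sketch below is not the paper's implementation: it assumes the hypothetical dual function $\max(u - 1, 1 - u)$ on $U = [-5, 5]$ and solves the linear master problem by enumerating endpoints and cut intersections, which is valid only for scalar $u$.

```python
def oracle(u):
    """Subproblem for the hypothetical L(u) = max(u - 1, 1 - u).

    Returns L(u) and the maximizing cut (f, g), so that L(u) = f + g * u.
    """
    pieces = [(-1.0, 1.0), (1.0, -1.0)]
    f, g = max(pieces, key=lambda cut: cut[0] + cut[1] * u)
    return f + g * u, (f, g)


def solve_linear_master(cuts, lo, hi):
    """Minimize max_i (f_i + g_i * u) over [lo, hi] by checking breakpoints."""
    candidates = [lo, hi]
    for i, (fi, gi) in enumerate(cuts):
        for fj, gj in cuts[i + 1:]:
            if gi != gj:
                u = (fj - fi) / (gi - gj)  # where cut i and cut j intersect
                if lo <= u <= hi:
                    candidates.append(u)

    def model(u):
        return max(f + g * u for f, g in cuts)

    u_best = min(candidates, key=model)
    return u_best, model(u_best)


def cutting_plane(u1, lo, hi, tol=1e-12, max_iter=50):
    """Cutting plane iterations; returns the final point and the values L(u^k)."""
    value, cut = oracle(u1)
    cuts, history = [cut], [value]
    u = u1
    for _ in range(max_iter):
        u, w = solve_linear_master(cuts, lo, hi)
        value, cut = oracle(u)
        history.append(value)
        if value - w <= tol:  # model agrees with the true function: stop
            break
        if cut not in cuts:
            cuts.append(cut)
    return u, history


u_opt, history = cutting_plane(4.0, -5.0, 5.0)
print(u_opt, history)
```

In this run the dual values first rise and then fall, while the master values $w$ only increase, which matches the remark that convergence rests on the monotonicity of the model values rather than of the dual values.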

The cutting plane algorithm with line search (CPLS): In an effort to force the
CP algorithm to descend the dual function, Hearn and Lawphongpanich (1989b &
1990) added a line search step. CPLS can be obtained from the generic algorithm
by setting $c_k = \infty$ for all $k$ and, in Step 4, letting

$$v^{k+1} = \arg\min \{\mathcal{L}(u) : u = v^k + \lambda(u^{k+1} - v^k), \; 0 \le \lambda \le \lambda_{up}\}$$

where $\lambda_{up} = \max\{\lambda : v^k + \lambda(u^{k+1} - v^k) \in U\}$.
Thus, $v^{k+1}$ minimizes $\mathcal{L}(u)$ along the direction
$d^k = u^{k+1} - v^k$. Hearn and Lawphongpanich (1989a) showed that, if
$\mathcal{L}(u)$ is differentiable at $v^k$, then $d^k$ is a descent direction
and $\mathcal{L}(v^{k+1}) < \mathcal{L}(v^k)$. Therefore, CPLS is a variant of
the generic algorithm which has a linear master problem and attempts to descend
the dual function, i.e., a descent is guaranteed whenever the dual function is
differentiable at the current iterate, $v^k$.


The bundle methods: As in CPLS, the main thrust of the bundle methods, first
introduced by Lemarechal (1974, 1975) and Wolfe (1975), is to generate a
monotonic sequence of dual function values. From the generic algorithm, one can
obtain a version of the bundle methods by setting $c_k < \infty$ for all $k$
and, in Step 4, letting

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m(L^k(v^k) - L^k(u^{k+1})) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Other methods for determining $v^{k+1}$ exist and they can
be found in, e.g., Auslender (1987), Fukushima (1984), Gaudioso and Monaco
(1982), Kiwiel (1985 & 1989), Lemarechal (1989) and Mifflin (1977). Also, note
that updating $v^{k+1}$ is in essence choosing the prox-center for the next
iteration.

Several authors (e.g., Fukushima, 1984, Kiwiel, 1989 and Lemarechal, 1991)
have observed the similarity between proximal minimization and bundle methods.
However, it is interesting that the developments of the two types of algorithms
appear different. In an effort to unify the development of bundle methods and
proximal minimization, Section 4 provides a convergence proof for a simple
method for updating $v^{k+1}$ which is different from, but related to, the one
shown above.

When $v^{k+1} = u^{k+1}$, the iteration is called a 'serious' step. Otherwise
(i.e., $v^{k+1} = v^k$), it is called a 'null' step. So, after every serious
step, the dual function decreases and bundle methods change the prox-center.
Since $c_k < \infty$, the master problem for the bundle methods is quadratic
(see problem MP or MD). Therefore, any bundle method can be considered as a
quadratic counterpart of CPLS since it has a quadratic master problem and
attempts to descend the dual function, in that it decreases the dual function at
every serious step. To emphasize the fact that bundle methods are variants of
the generic algorithm, we also refer to them as proximal minimization algorithms
with subgradient bundles (PMSB).

A proximal minimization algorithm with cutting planes (PMCP): Setting
$c_k < \infty$ and always letting $v^{k+1} = u^{k+1}$ in Step 4 produces a
variant of the generic algorithm which has a quadratic master problem and does
not attempt to descend the dual function. Note that PMCP is similar to the
bundle methods because both have a quadratic master problem; however, it is
different because it changes the prox-center after every iteration instead of
only after a serious iteration. In the framework of the generic algorithm, PMCP
is a quadratic counterpart of the cutting plane algorithm, for neither attempts
to descend the dual function, and one has a linear master problem and the other
a quadratic one.

As mentioned earlier, the convergence of the CP algorithm does not require
any monotonicity of the dual function values. On one hand, it is curious that
an algorithm can converge without any attempt to decrease the dual function
directly. On the other hand, the convergence of the CP algorithm confirms that
decreases in the cutting plane approximating function are sufficient to ensure
that the dual function eventually converges (not necessarily in a monotonic
manner) to the optimal value. The convergence proof for PMCP in the next section
further corroborates this hypothesis.

Table 1 below summarizes the relationships among the four algorithms which
use cutting planes to approximate the objective function. Recall that the phrase
'attempt to descend the dual function' is to indicate that, although none of the
four algorithms guarantees a decrease in the dual function at every iteration,
some make an attempt to decrease the function in each one. In particular, the
bundle methods only yield a decrease at every serious step and CPLS yields one
whenever the dual function is differentiable. Nevertheless, all are proven to
converge to a solution of D.


                        Attempt to Descend the Dual Function
Master Problem          Yes                        No
Linear (c_k = ∞)        CPLS                       CP
Quadratic (c_k < ∞)     Bundle methods or PMSB     PMCP

Table 1: Classes of algorithms which use cutting planes
3. Convergence of PMCP

Below, we restate more concisely the generic algorithm as specialized to the
proximal minimization algorithm with cutting planes.

A PROXIMAL MINIMIZATION ALGORITHM
WITH CUTTING PLANES (PMCP)

Step 1: Select $u^1 \in U$. Set $k = 1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \{L^k(u) + \frac{1}{2c_k}\|u - u^k\|^2\}.$$

If $u^{k+1} = u^k$, stop and $u^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Increment $k$ by 1 and go to Step 2.
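The steps of PMCP can be sketched on a scalar toy problem as follows. This is an illustration under stated assumptions rather than the paper's implementation: the dual function is the hypothetical $\max(u - 1, 1 - u)$ on $U = [-5, 5]$, $c_k$ is held constant, and the quadratic master problem is solved by enumerating candidate minimizers, which works only in one dimension.

```python
def oracle(u):
    """Subproblem for the hypothetical L(u) = max(u - 1, 1 - u).

    Returns L(u) and the maximizing cut (f, g), so that L(u) = f + g * u.
    """
    pieces = [(-1.0, 1.0), (1.0, -1.0)]
    f, g = max(pieces, key=lambda cut: cut[0] + cut[1] * u)
    return f + g * u, (f, g)


def solve_prox_master(cuts, center, c, lo, hi):
    """Minimize max_i (f_i + g_i * u) + (u - center)^2 / (2c) over [lo, hi].

    For scalar u the minimizer is an endpoint, a stationary point of a single
    cut plus the proximal term, or an intersection of two cuts.
    """
    candidates = [lo, hi]
    for i, (fi, gi) in enumerate(cuts):
        candidates.append(center - c * gi)
        for fj, gj in cuts[i + 1:]:
            if gi != gj:
                candidates.append((fj - fi) / (gi - gj))
    feasible = [min(max(u, lo), hi) for u in candidates]

    def objective(u):
        return max(f + g * u for f, g in cuts) + (u - center) ** 2 / (2.0 * c)

    return min(feasible, key=objective)


def pmcp(u1, c, lo, hi, tol=1e-12, max_iter=100):
    """PMCP: the prox-center moves to the new master solution at every iteration."""
    u = u1
    cuts = [oracle(u)[1]]                            # Step 1
    iterates = [u]
    for _ in range(max_iter):
        u_next = solve_prox_master(cuts, u, c, lo, hi)   # Step 2
        if abs(u_next - u) <= tol:
            break
        u = u_next
        iterates.append(u)
        cut = oracle(u)[1]                           # Step 3
        if cut not in cuts:
            cuts.append(cut)
    return u, iterates


u_opt, iterates = pmcp(4.0, 1.0, -5.0, 5.0)
print(u_opt, iterates)
```

In this run the iterates overshoot the minimizer once before returning to it, so the dual values are not monotone, yet the method stops at the optimal point.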

First, note that since $v^{k+1}$ always equals $u^{k+1}$, the variable $v$ is
not needed and has been eliminated from the above algorithm. Then, recall that
in Step 2 $L^k(u)$ is convex and defined previously as

$$L^k(u) = \max_{i=1,\ldots,k} \{f(x^i) + u \cdot g(x^i)\}.$$

In Step 3, $x^{k+1}$ satisfies

$$f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1}) = \mathcal{L}(u^{k+1}). \qquad (3)$$

The theorem below validates the stopping rule in Step 2.
Theorem 1. If $u^{k+1} = u^k$, then $u^k$ is an optimal solution to problem D.

Proof: Consider the cutting plane representation of the master problem at the
$k$-th iteration:

$$\min \; w + \frac{1}{2c_k}\|u - u^k\|^2$$
$$\text{s.t.} \quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$Au \le b$$

Then, $(w^{k+1}, u^{k+1})$, where $w^{k+1} = L^k(u^{k+1})$, is an optimal
solution. Since $u^{k+1} = u^k$, it follows from (3) that

$$w^{k+1} = L^k(u^{k+1}) = L^k(u^k) = \mathcal{L}(u^k).$$

In addition, the KKT conditions are necessary at $(w^{k+1}, u^k)$ and there must
exist vectors $\pi$ and $\lambda$ satisfying the following equations:

$$\sum_{i \in I'} \pi_i g(x^i) + \sum_{j \in J'} \lambda_j a^j = 0$$
$$\sum_{i \in I'} \pi_i = 1$$
$$\pi_i, \lambda_j \ge 0 \quad \forall i \in I' \text{ and } j \in J'$$

where

$a^j$ = the $j$-th row of matrix $A$,
$I' = \{i : w^{k+1} = f(x^i) + u^k \cdot g(x^i) \text{ for } i = 1, \ldots, k\}$ and
$J' = \{j : a^j \cdot u^k = b_j \text{ for } j = 1, \ldots, p\}$.

Since $w^{k+1} = \mathcal{L}(u^k)$, $g(x^i)$, $\forall i \in I'$, are
subgradients of $\mathcal{L}$ at $u^k$ and

$$H(g(x^i) : i \in I') \subset \partial\mathcal{L}(u^k)$$

where $H(\cdot)$ denotes a convex hull. Thus, the KKT conditions can be written
more compactly as

$$0 \in \partial\mathcal{L}(u^k) + \sum_{j \in J'} a^j \lambda_j.$$

However, this is the KKT condition for problem D. Since $\mathcal{L}(u)$ is
convex and $U$ is a polyhedron, the condition is sufficient and the proof is
complete. □

By the above theorem, if PMCP stops after a finite number of iterations, it
must stop at an optimal solution. When PMCP generates an infinite sequence, it
is sufficient to show that PMCP converges to an optimal solution for the case
$c_k = c > 0$ $\forall k$. (This is true because of the assumption that
$\liminf_{k \to \infty} c_k > 0$.) To do so, define the following:

$X^\infty = \{x^1, x^2, x^3, \ldots\}$, i.e., the set of $x^i$ generated by
Steps 1 and 3 of PMCP.

$[X^\infty]$ = the closure of $X^\infty$. Note that $[X^\infty] \subset X$.

$L^\infty(u) = \max_{x \in [X^\infty]} \{f(x) + u \cdot g(x)\}$.

From the above description, it is clear that

$$L^k(u) \le L^\infty(u) \le \mathcal{L}(u) \quad \text{for } k = 1, 2, \ldots$$

where the first inequality follows from the fact that
$\{x^i : i = 1, \ldots, k\} \subset [X^\infty]$ and the second inequality from
the fact that $[X^\infty] \subset X$. Observe also that for any $k$

$$L^{k+j}(u^k) = \mathcal{L}(u^k) = f(x^k) + u^k \cdot g(x^k) \quad \forall j = 0, 1, 2, \ldots \qquad (4)$$

Similarly, since $x^k \in [X^\infty]$, the following must hold

$$L^\infty(u^k) = \mathcal{L}(u^k) \quad \forall k. \qquad (5)$$

Moreover, $\{L^k(u)\}_k$ is a sequence of continuous convex functions which
converges pointwise to $L^\infty(u)$. However, since $\{L^k(u)\}_k$ is also
monotonic, it must also converge uniformly to $L^\infty(u)$ (see Theorem 7.13
in Rudin, 1976).

To prove convergence and obtain a solution to D, define a sequence $\{z^k\}$ as
follows: let $z^1 = \mathcal{L}(u^1)$ and for $k = 1, 2, 3, \ldots$ let

$$z^{k+1} = \begin{cases} \mathcal{L}(u^{k+1}) & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k \\ z^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Also, we have from (4) that
$\mathcal{L}(u^{k+1}) = L^{k+1}(u^{k+1})$. So, computing $z^k$ requires no extra
effort. Next, construct an index set $\mathcal{K}$ as follows

$$\mathcal{K} = \{k : z^{k+1} = \mathcal{L}(u^{k+1})\}.$$
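The bookkeeping behind the sequence of sufficient-decrease values and the associated index set can be sketched as below for scalar iterates. The iterates and the dual function used here are hypothetical sample data (they match a one-dimensional run with dual function $|u - 1|$, $m = 0.5$ and $c = 1$), and the function name is an assumption.

```python
def sufficient_decrease_sequence(iterates, dual, m, c):
    """Compute z^1, z^2, ... and the index set of sufficient-decrease iterations.

    iterates[0] is u^1, iterates[k] is u^{k+1}; dual(u) evaluates L(u).
    Iteration k is recorded when L(u^{k+1}) + (m / 2c) |u^{k+1} - u^k|^2 <= z^k.
    """
    z = [dual(iterates[0])]
    index_set = []
    for k in range(1, len(iterates)):  # k is the 1-based iteration index
        u_prev, u_next = iterates[k - 1], iterates[k]
        trial = dual(u_next) + m * (u_next - u_prev) ** 2 / (2.0 * c)
        if trial <= z[-1]:
            z.append(dual(u_next))
            index_set.append(k)
        else:
            z.append(z[-1])
    return z, index_set


# Hypothetical iterates u^1, ..., u^6 and dual function L(u) = |u - 1|.
sample_iterates = [4.0, 3.0, 2.0, 1.0, 0.0, 1.0]
z, index_set = sufficient_decrease_sequence(
    sample_iterates, lambda u: abs(u - 1.0), m=0.5, c=1.0
)
print(z, index_set)
```

By construction the recorded values never increase, even when the dual values themselves go up and down.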
In words, $\mathcal{K}$ is the index set of iterations in which there is a
sufficient decrease in the dual function, i.e., by an amount
$\frac{m}{2c}\|u^{k+1} - u^k\|^2$. The next two results address the convergence
of PMCP when $\mathcal{K}$ is an infinite set.

Lemma 2. Let $\mathcal{K}$ be an infinite set. If a subsequence
$\{u^k\}_{k \in K}$ converges to $u^\infty$ for some $K \subset \mathcal{K}$,
then $\{u^{k+1}\}_{k \in K}$ also converges to $u^\infty$.

Proof: Consider the sequence $\{z^k\}_k$. By definition, it is a nonincreasing
sequence which is bounded below by $\mathcal{L}(u^*)$. Thus, $\{z^k\}_k$ must
converge. Since $K \subset \mathcal{K}$, the following must hold for all
$k \in K$

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k$$
$$z^{k+1} + \frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k$$

Taking the limit as $k \to \infty$ and $k \in K$ yields that

$$\lim_{k \in K} \frac{m}{2c}\|u^{k+1} - u^k\|^2 = 0.$$

Since both $m$ and $c$ are positive, $\{u^k\}_{k \in K}$ and
$\{u^{k+1}\}_{k \in K}$ must have a common limit point, $u^\infty$. □

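Theorem 3. If $\mathcal{K}$ is an infinite set, then every limit point of the
sequence $\{u^k\}_{k \in \mathcal{K}}$ is a solution to D.

Proof: Let $u^*$ be a solution to D and $\lim_{k \in K} u^k = u^\infty$ for some
$K \subset \mathcal{K}$. Since $u^{k+1}$ solves the master problem in Step 2,
the following must hold

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(u) + \frac{1}{2c}\|u - u^k\|^2 \quad \forall u \in U \text{ and } \forall k. \qquad (6)$$

For any $\alpha \in (0, 1)$, setting $u = \alpha u^* + (1 - \alpha)u^{k+1}$ in
(6) gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^* + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^* + (1 - \alpha)u^{k+1} - u^k\|^2$$

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le \alpha L^k(u^*) + (1 - \alpha)L^k(u^{k+1}) + \frac{1}{2c}\|\alpha(u^* - u^k) + (1 - \alpha)(u^{k+1} - u^k)\|^2$$

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le \alpha L^k(u^*) + (1 - \alpha)L^k(u^{k+1}) + \frac{1}{2c}\left(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\right)^2$$

$$\alpha L^k(u^{k+1}) \le \alpha L^k(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\left(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\right)^2$$

$$\alpha L^k(u^{k+1}) \le \alpha \mathcal{L}(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\left(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\right)^2 \qquad (7)$$

where the second inequality follows from convexity of $L^k(u)$, the third from
the triangle inequality and the last from the fact that
$L^k(u^*) \le \mathcal{L}(u^*)$. Since $L^j(u)$ is continuous for all
$j = 1, 2, \ldots$ and, from Lemma 2, $\|u^{k+1} - u^k\| \to 0$ for $k \in K$,
there must exist, for any $\varepsilon > 0$, a sufficiently large $k_1$ such
that for any $j$

$$|L^j(u^{k+1}) - L^j(u^k)| < \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1, \text{ or}$$

$$L^j(u^k) - \varepsilon < L^j(u^{k+1}) < L^j(u^k) + \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1.$$

Setting $j = k$ and using (3), i.e., $L^k(u^k) = \mathcal{L}(u^k)$, yield the
following

$$\mathcal{L}(u^k) - \varepsilon < L^k(u^{k+1}) < \mathcal{L}(u^k) + \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1.$$

Combine the left inequality with (7) to obtain that

$$\alpha(\mathcal{L}(u^k) - \varepsilon) < \alpha \mathcal{L}(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\left(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\right)^2 \quad \forall k \in K \text{ and } k \ge k_1.$$

Take the limit as $k \to \infty$ and $k \in K$ and obtain

$$\alpha(\mathcal{L}(u^\infty) - \varepsilon) \le \alpha \mathcal{L}(u^*) + \frac{1}{2c}\|\alpha(u^* - u^\infty)\|^2$$

$$\mathcal{L}(u^\infty) - \varepsilon \le \mathcal{L}(u^*) + \frac{\alpha}{2c}\|u^* - u^\infty\|^2$$

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) \le \frac{\alpha}{2c}\|u^* - u^\infty\|^2 + \varepsilon. \qquad (8)$$

Since (8) holds for any $\alpha \in (0, 1)$ and $\varepsilon$ can be chosen
arbitrarily small, it must be true that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) = 0.$$

Thus, $u^\infty$ is a solution to D. □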
An immediate consequence of Theorem 3 is that the entire sequence
$\{u^k\}_{k \in \mathcal{K}}$ converges to the optimal solution when D has a
unique solution (see, e.g., Bazaraa and Shetty, 1979).

Consider now the case when $\mathcal{K}$ is finite. Define
$\ell = \max\{k : k \in \mathcal{K}\} + 1$. Then, $z^k = z^\ell$
$\forall k \ge \ell$, and

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 > z^\ell \quad \forall k \ge \ell. \qquad (9)$$

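Lemma 4. Let $\mathcal{K}$ be a finite set and $\ell$ be as defined above. Then,

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 = 0.$$

Proof: Assume otherwise, i.e., there exists a $\delta > 0$ such that

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 \ge \delta. \qquad (10)$$

In other words, for a sufficiently large $k_1 \ge \ell$,

$$\|u^{k+1} - u^k\|^2 \ge \delta, \quad \forall k \ge k_1. \qquad (11)$$

From Theorem 3, setting $u = u^k$ in (6) produces the following

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(u^k) \quad \forall k. \qquad (12)$$

Since $\{L^k(u)\}_k$ converges uniformly to $L^\infty(u)$, there must exist for
every $\varepsilon \in (0, \delta)$ a sufficiently large $k_2$ such that

$$|L^k(u) - L^\infty(u)| < \frac{\varepsilon}{4c} \quad \forall k \ge k_2 \text{ and } u \in U, \text{ or}$$

$$L^\infty(u) - \frac{\varepsilon}{4c} < L^k(u) < L^\infty(u) + \frac{\varepsilon}{4c} \quad \forall k \ge k_2 \text{ and } u \in U. \qquad (13)$$

Combining (12) and (13) yields

$$L^\infty(u^{k+1}) - \frac{\varepsilon}{4c} + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^\infty(u^k) + \frac{\varepsilon}{4c} \quad \forall k \ge k_2$$

$$L^\infty(u^{k+1}) + \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} \le L^\infty(u^k) \quad \forall k \ge k_2. \qquad (14)$$

Using (5), we must have that

$$\mathcal{L}(u^{k+1}) + \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} \le \mathcal{L}(u^k) \quad \forall k \ge k_2.$$

However, (11) and (14) imply that the subsequence
$\{L^\infty(u^k)\}_{k \ge \bar{k}}$, where $\bar{k} = \max(k_1, k_2)$, is a
monotonically decreasing sequence and bounded below by $\mathcal{L}(u^*)$.
Therefore, $\{L^\infty(u^k)\}_{k \ge \bar{k}}$ must converge and

$$\lim_{k \ge \bar{k}} \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} \le \lim_{k \ge \bar{k}} (\mathcal{L}(u^k) - \mathcal{L}(u^{k+1})) = 0$$

$$\lim_{k \ge \bar{k}} \|u^{k+1} - u^k\|^2 \le \varepsilon.$$

Since $\varepsilon$ can be chosen arbitrarily small, this contradicts (10). □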

The above lemma implies that there exists a $K \subset \{k : k \ge \ell\}$ such
that

$$\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0.$$

Since $U$ is compact and $u^k \in U$ for all $k$, there must also exist a
$K' \subset K$ such that $\{u^k\}_{k \in K'}$ converges to, say, $u^\infty$. As
a consequence, $\{u^{k+1}\}_{k \in K'}$ must converge to $u^\infty$ as well.


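Theorem 5. If $\mathcal{K}$ is finite, then $u^\ell$ is a solution to D.

Proof: Based on the preceding discussion, there must exist a
$K \subset \{k : k \ge \ell\}$ such that the following conditions hold

1. $\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0$.

2. $\{u^k\}_{k \in K} \to u^\infty$.

3. $\{u^{k+1}\}_{k \in K} \to u^\infty$.

From Theorem 3, setting $u = \alpha u^* + (1 - \alpha)u^{k+1}$ in (6) for any
$\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^* + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^* + (1 - \alpha)u^{k+1} - u^k\|^2.$$

Using the same argument as in Theorem 3 with the index set $K$, it can be shown
that

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*). \qquad (15)$$

Similarly, setting $u = \alpha u^\ell + (1 - \alpha)u^{k+1}$ in (6) for any
$\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^\ell + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^\ell + (1 - \alpha)u^{k+1} - u^k\|^2,$$

and by the same reasoning it must follow that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^\ell) \le 0 \text{ or } \mathcal{L}(u^\infty) \le \mathcal{L}(u^\ell). \qquad (16)$$

However, from (9) it is true that

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 > z^\ell \quad \forall k \ge \ell.$$

Take the limit as $k \to \infty$ and $k \in K$ and invoke the continuity of
$\mathcal{L}(u)$ to obtain that

$$\mathcal{L}(u^\infty) \ge z^\ell = \mathcal{L}(u^\ell). \qquad (17)$$

Combining (15), (16) and (17) yields

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*) = \mathcal{L}(u^\ell).$$

So, $u^\ell$ must be a solution to D. □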

In addition to the above convergence results, if $f(x)$ and $g(x)$ are linear
functions and $X$ is a bounded polyhedron, then $x^{k+1}$ in Step 3 can be
restricted to extreme points of $X$, of which there are finitely many. In that
case, there must exist a sufficiently large $\ell$ such that
$L^k(u) = L^\infty(u)$ $\forall k \ge \ell$ and $u \in U$. However, this implies
that after $\ell$ iterations PMCP reduces to the application of the proximal
minimization algorithm to the following linear program:

$$\min \; w$$
$$\text{s.t.} \quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, \ell$$
$$Au \le b$$

Then, it follows from Exercise 4.3 in Bertsekas and Tsitsiklis (1989) that PMCP
terminates finitely when D is the dual of a linear program, or equivalently,
$\mathcal{L}(u)$ is piecewise linear.
4. A Bundle Method

Below, we describe a particular variant of the bundle methods which uses a
different scheme for updating $v^{k+1}$ in Step 4 of the generic algorithm. For
later reference, we call this variant a proximal minimization algorithm with
subgradient bundles (PMSB). One intention of this section is to present a
convergence proof for PMSB which uses analysis similar to that of PMCP, thereby
making the relationship between bundle methods and proximal minimization more
concrete. Also, it should be noted that some variants of the bundle methods
require a line search step (see, e.g., Fukushima, 1984, Gaudioso and Monaco,
1982 and Kiwiel, 1985). However, PMSB as stated below does not require any line
search.

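A PROXIMAL MINIMIZATION ALGORITHM
WITH SUBGRADIENT BUNDLES (PMSB)

Step 1: Select $u^1 \in U$ and $m$ such that $0 < m < 1$. Set $k = 1$,
$v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \{L^k(u) + \frac{1}{2c_k}\|u - v^k\|^2\}.$$

If $u^{k+1} = v^k$, stop and $v^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Set

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c_k}\|u^{k+1} - v^k\|^2 \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (18)$$

Increment $k$ by 1 and go to Step 2.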


Recall that, when $v^{k+1} = u^{k+1}$, iteration $k$ is called a 'serious' step
or iteration. Otherwise ($v^{k+1} = v^k$), it is called a 'null' step. In
addition, the updating formula for $v^{k+1}$ in Step 4 is also related to the
one in Section 2 (see also Lemarechal, 1991) which is

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m(L^k(v^k) - L^k(u^{k+1})) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (19)$$

To obtain the relationship, observe that since $u^{k+1}$ is a solution to the
master problem

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - v^k\|^2 \le \mathcal{L}(u^{k+1}) + m(L^k(v^k) - L^k(u^{k+1})).$$

So, the updating formula (19) implies (18).
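The serious/null step logic of update (18) can be sketched on a scalar toy problem. As with any such illustration, the choices here are assumptions: the dual function is the hypothetical $\max(u - 1, 1 - u)$ on $U = [-5, 5]$, $c_k$ is constant, and the master problem is solved by enumerating candidate points, which is valid only for scalar $u$.

```python
def oracle(u):
    """Subproblem for the hypothetical L(u) = max(u - 1, 1 - u).

    Returns L(u) and the maximizing cut (f, g), so that L(u) = f + g * u.
    """
    pieces = [(-1.0, 1.0), (1.0, -1.0)]
    f, g = max(pieces, key=lambda cut: cut[0] + cut[1] * u)
    return f + g * u, (f, g)


def solve_prox_master(cuts, center, c, lo, hi):
    """Minimize max_i (f_i + g_i * u) + (u - center)^2 / (2c) over [lo, hi]."""
    candidates = [lo, hi]
    for i, (fi, gi) in enumerate(cuts):
        candidates.append(center - c * gi)            # single-cut stationary point
        for fj, gj in cuts[i + 1:]:
            if gi != gj:
                candidates.append((fj - fi) / (gi - gj))  # kink between two cuts
    feasible = [min(max(u, lo), hi) for u in candidates]

    def objective(u):
        return max(f + g * u for f, g in cuts) + (u - center) ** 2 / (2.0 * c)

    return min(feasible, key=objective)


def pmsb(u1, c, m, lo, hi, tol=1e-12, max_iter=100):
    """PMSB: the prox-center v moves only when test (18) is satisfied."""
    v = u1
    dual_v, cut = oracle(v)                            # Step 1
    cuts, steps = [cut], []
    for _ in range(max_iter):
        u_next = solve_prox_master(cuts, v, c, lo, hi)    # Step 2
        if abs(u_next - v) <= tol:
            break
        dual_u, cut = oracle(u_next)                   # Step 3
        if cut not in cuts:
            cuts.append(cut)
        # Step 4: sufficient-decrease test (18)
        if dual_u + m * (u_next - v) ** 2 / (2.0 * c) <= dual_v:
            v, dual_v = u_next, dual_u
            steps.append("serious")
        else:
            steps.append("null")
    return v, steps


v_opt, steps = pmsb(4.0, 1.0, 0.5, -5.0, 5.0)
print(v_opt, steps)
```

In this run the prox-center stays at the minimizer during the final null step, which only enriches the model with a new cut. A method that moves the center at every iteration would instead step away from the minimizer at that point and come back afterwards.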

When  PMSB  terminates  finitely.  Theorem  1  in  the  previous  section  still  guar¬ 
antees  that  y*  is  an  optimal  solution  to  D.  Below  are  convergence  results  for 
the  case  when  the  algorithm  generates  an  infinite  sequence.  As  in  Section  3,  it  is 
assumed  without  loss  of  generality  that  c^  =  c  >  0,  V  k,  and  let 

K  =  {k:  y*+‘  = 

So,  K  is  the  inde.x  set  for  the  serious  steps  (iterations). 

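Lemma 6. Let $\mathcal{K}$ be an infinite set. If a subsequence $\{v^k\}_{k \in K}$ converges to $v^\infty$, where $K \subset \mathcal{K}$, then $\{v^{k+1}\}_{k \in K}$ also converges to $v^\infty$.

Proof: Note that the sequence $\{\mathcal{L}(v^k)\}_k$ is a nonincreasing sequence which is bounded below by $\mathcal{L}(u^*)$. So, $\{\mathcal{L}(v^k)\}_k$ must converge. Since $K \subset \mathcal{K}$, the following must hold:

$$\mathcal{L}(v^{k+1}) + \frac{m}{2c} \| v^{k+1} - v^k \|^2 \le \mathcal{L}(v^k), \quad \forall k \in K.$$

Following the same argument as in Lemma 2, it can be shown that

$$0 = \lim_{k \in K} \frac{m}{2c} \| v^{k+1} - v^k \|^2.$$

Since both $c$ and $m$ are positive, $\{v^{k+1}\}_{k \in K}$ must converge to $v^\infty$. □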
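
Lemma 2 is not reproduced in this section. A telescoping argument of the following kind is presumably what is meant; it is offered as a sketch, assuming $u^*$ denotes a solution to D and using the fact that both sides of the first inequality vanish on null steps, where $v^{k+1} = v^k$.

```latex
\begin{align*}
\frac{m}{2c}\,\|v^{k+1} - v^k\|^2
  &\le \mathcal{L}(v^k) - \mathcal{L}(v^{k+1}) \quad \forall k, \\
\sum_{k=1}^{N} \frac{m}{2c}\,\|v^{k+1} - v^k\|^2
  &\le \mathcal{L}(v^1) - \mathcal{L}(v^{N+1})
   \le \mathcal{L}(v^1) - \mathcal{L}(u^*) < \infty \quad \forall N, \\
\|v^{k+1} - v^\infty\|
  &\le \|v^{k+1} - v^k\| + \|v^k - v^\infty\| \longrightarrow 0
  \quad (k \in K,\ k \to \infty).
\end{align*}
```

The second line shows that the series converges, so its terms tend to zero; the third line then gives the claim of the lemma.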
Theorem 7. If the cardinality of $\mathcal{K}$ is infinite, then every limit point of the sequence $\{v^k\}$ is a solution to D.


Proof: Let $\{v^k\}_{k \in K}$ be a convergent subsequence, where $K \subset \mathcal{K}$. Since $v^{k+1} = u^{k+1}$, $\forall k \in K$, and $u^{k+1}$ is optimal to the master problem, the following must hold:

$$L^k(v^{k+1}) + \frac{1}{2c} \| v^{k+1} - v^k \|^2 \le L^k(u) + \frac{1}{2c} \| u - v^k \|^2, \quad \forall u \in U,\ k \in K.$$

Using the same analysis as in Theorem 3 and the result of Lemma 6, it can be shown that $\{v^k\}_{k \in K}$ converges to $u^*$. □


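When the cardinality of $\mathcal{K}$ is finite, define, as before, $\ell = \max\{k : k \in \mathcal{K}\} + 1$. So, every iteration $k \ge \ell$ must be a null step and the master problem in Step 2 must have the form:

$$u^{k+1} = \arg\min_{u \in U} \Bigl\{ L^k(u) + \frac{1}{2c} \| u - v^\ell \|^2 \Bigr\}, \quad \forall k \ge \ell.$$

Next, let

$$F^k(u) = L^k(u) + \frac{1}{2c} \| u - v^\ell \|^2 \quad \text{and}$$

$$F^\infty(u) = L^\infty(u) + \frac{1}{2c} \| u - v^\ell \|^2.$$


Then, $F^k(u)$ epi-converges to $F^\infty(u)$ since $L^k(u)$ converges pointwise and monotonically to $L^\infty(u)$. (For the definition and properties of epi-convergence, see, e.g., the appendix in Wets, 1989.) In addition, it follows from Theorem A.2 of Wets (1989) that if $\{u^{k+1}\}_{k \in K} \to u^\infty$ for some $K \subset \{1, 2, 3, \ldots\}$, then

$$u^\infty = \arg\min_{u \in U} F^\infty(u).$$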


Furthermore, since $L^k(\cdot)$ also converges uniformly to $L^\infty(\cdot)$, there must exist, for any $\epsilon > 0$, a sufficiently large $k_1$ such that

$$| L^k(u) - L^\infty(u) | < \epsilon, \quad \forall u \in U \text{ and } k \ge k_1,$$

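and by setting $u = u^{k+1}$,

$$[\ldots] < \epsilon, \quad \forall k \ge k_1,$$

$$[\ldots] < \epsilon, \quad \forall k \ge k_1,$$

$$\mathcal{L}(u^{k+1}) - \epsilon < L^k(u^{k+1}) < [\ldots] + \epsilon, \quad \forall k \ge k_1, \tag{20}$$

where the middle inequality follows from (5).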
Theorem 8. If the cardinality of $\mathcal{K}$ is finite, then $v^\ell$ is a solution to D.


Proof: Since $U$ is a compact set, there must exist a set $K \subset \{k : k \ge \ell\}$ such that $\{u^{k+1}\}_{k \in K}$ converges to $u^\infty$.

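If $u^\infty = v^\ell$, then the above observation concerning the epi-convergence of $F^k(u)$ implies that

$$\begin{aligned}
\mathcal{L}(v^\ell) &= L^\infty(v^\ell) \\
&= L^\infty(u^\infty) + \frac{1}{2c} \| u^\infty - v^\ell \|^2 \\
&= \min_{u \in U} \Bigl\{ L^\infty(u) + \frac{1}{2c} \| u - v^\ell \|^2 \Bigr\} \\
&\le \min_{u \in U} \Bigl\{ \mathcal{L}(u) + \frac{1}{2c} \| u - v^\ell \|^2 \Bigr\} \\
&\le \mathcal{L}(v^\ell),
\end{aligned}$$

where the first equation follows from (5), the third from the observation that $u^\infty = \arg\min_{u \in U} F^\infty(u)$, the fourth from the fact that $L^\infty(u) \le \mathcal{L}(u)$, $\forall u \in U$, and the last from the fact that $v^\ell$ is an element of $U$. Thus,

$$\mathcal{L}(v^\ell) = \min_{u \in U} \Bigl\{ \mathcal{L}(u) + \frac{1}{2c} \| u - v^\ell \|^2 \Bigr\}.$$

This, in turn, implies that $v^\ell$ solves D.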

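Assume that $u^\infty \ne v^\ell$. Let $\delta = \| u^\infty - v^\ell \|^2$. Then, there must exist a sufficiently large $k_2$ such that

$$\| u^{k+1} - v^\ell \|^2 \ge \frac{\delta}{2}, \quad \forall k \ge k_2,\ k \in K. \tag{21}$$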

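However, since $u^{k+1}$ solves the master problem in Step 2 with $v^\ell$ as its prox-center, it must be true that

$$L^k(u^{k+1}) + \frac{1}{2c} \| u^{k+1} - v^\ell \|^2 \le L^k(v^\ell) = \mathcal{L}(v^\ell), \quad \forall k \ge \ell,$$

where the equality follows from (4) in Section 3. For any $\epsilon > 0$, let $k_1$ be as in (20) so that

$$\mathcal{L}(u^{k+1}) - \epsilon + \frac{1}{2c} \| u^{k+1} - v^\ell \|^2 \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1).$$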


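Set $\epsilon = \dfrac{(1 - m)\,\delta}{4c}$ and obtain

$$\mathcal{L}(u^{k+1}) + \frac{1}{2c} \Bigl( \| u^{k+1} - v^\ell \|^2 - (1 - m) \frac{\delta}{2} \Bigr) \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1).$$

Then, for any $k \ge \max(\ell, k_1, k_2)$ and $k \in K$, (21) implies that

$$\begin{aligned}
\mathcal{L}(v^\ell) &\ge \mathcal{L}(u^{k+1}) + \frac{1}{2c} \Bigl( \| u^{k+1} - v^\ell \|^2 - (1 - m) \frac{\delta}{2} \Bigr) \\
&\ge \mathcal{L}(u^{k+1}) + \frac{1}{2c} \Bigl( \| u^{k+1} - v^\ell \|^2 - (1 - m) \| u^{k+1} - v^\ell \|^2 \Bigr).
\end{aligned}$$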

However, this implies that there must be a serious step after iteration $\ell$, which is a contradiction. Thus, every convergent subsequence of $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$, which is a solution to D. This, in turn, implies that the sequence $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$ as well. □
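
The contradiction rests on recognizing the final bound of the proof as the serious-step test. A short worked simplification, assuming that bound has the form $\frac{1}{2c}\bigl(\|u^{k+1} - v^\ell\|^2 - (1 - m)\|u^{k+1} - v^\ell\|^2\bigr)$ and that $v^k = v^\ell$, $c_k = c$ for all $k \ge \ell$:

```latex
\begin{align*}
\frac{1}{2c}\Bigl(\|u^{k+1} - v^\ell\|^2 - (1 - m)\,\|u^{k+1} - v^\ell\|^2\Bigr)
  &= \frac{m}{2c}\,\|u^{k+1} - v^\ell\|^2, \\
\text{so that}\quad
\mathcal{L}(u^{k+1}) + \frac{m}{2c}\,\|u^{k+1} - v^\ell\|^2
  &\le \mathcal{L}(v^\ell).
\end{align*}
```

The second line is exactly the condition in (18) under which Step 4 sets $v^{k+1} = u^{k+1}$, so iteration $k$ would be a serious step even though $k \ge \ell$.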
5.  Conclusion


This paper presents a generic algorithm in the framework of proximal minimization. It is shown that this generic algorithm can be specialized to four different algorithms: the cutting plane algorithm, the cutting plane algorithm with line searches, the bundle methods (or proximal minimization with subgradient bundles, PMSB) and proximal minimization with cutting planes (PMCP). The first three can be found in the current literature; however, the last one is new.

Besides the obvious relationship that all four algorithms can be derived from the generic algorithm, other relationships based on the master problem and convergence behavior are also established. Convergence proofs for PMSB and PMCP are also given.

